The Essential Dynamics Algorithm: Fast Policy Search In Continuous Worlds
Abstract
This paper presents a novel algorithm for learning in a class of stochastic Markov decision processes (MDPs) with continuous state and action spaces that trades accuracy for speed. The algorithm can be seen as a generalization of linear quadratic control to nonlinear, non-regulation problems. A transform of the stochastic MDP into a deterministic one is presented that captures the essence of the original dynamics, in a sense made precise. In this transformed MDP, the calculation of values is greatly simplified. The online algorithm estimates the model of the transformed MDP and simultaneously performs policy search against it. Bounds on the error of this approximation are proven, and experimental results are presented in both a bicycle-riding domain and the control of a robot arm on a dynamic base, a 14-dimensional state space. The algorithm learns near-optimal policies in orders of magnitude fewer interactions with the stochastic MDP, using less domain knowledge. Code is available on the project’s web site.
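The abstract describes the algorithm as a generalization of linear quadratic control. For reference only, the sketch below shows that baseline in its simplest form, a scalar discrete-time linear quadratic regulator solved by iterating the Riccati recursion. It is not the essential dynamics algorithm itself, and the function name, parameters, and example values are illustrative assumptions rather than anything taken from the paper. In this linear case with additive zero-mean noise, the optimal feedback gain matches that of the noise-free system, which is one familiar setting where a deterministic model is enough to choose a policy.

```python
def scalar_lqr_gain(a, b, q, r, iterations=500):
    """Illustrative scalar LQR (hypothetical helper, not from the paper).

    Dynamics:   x_next = a * x + b * u  (+ zero-mean additive noise)
    Stage cost: q * x**2 + r * u**2
    Returns (k, p): the feedback gain for u = -k * x and the
    coefficient p of the quadratic cost-to-go p * x**2.
    """
    p = q  # start the value iteration from the one-step cost
    for _ in range(iterations):
        # Scalar discrete-time Riccati recursion.
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    k = a * b * p / (r + b * b * p)
    return k, p


if __name__ == "__main__":
    # Assumed example system: unit dynamics and unit costs.
    gain, cost_coeff = scalar_lqr_gain(a=1.0, b=1.0, q=1.0, r=1.0)
    print(gain, cost_coeff)
```

For the unit system in the example, the fixed point satisfies p**2 - p - 1 = 0, so the cost coefficient converges to the golden ratio and the gain to its reciprocal. Nonlinear, non-regulation problems such as those in the abstract have no such closed form.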
Related resources
Controlling Cardea: Fast Policy Search in a High Dimensional Space
The essential dynamics algorithm is a novel policy search algorithm for learning in a class of stochastic Markov decision processes (MDPs) with continuous state and action spaces. We apply it to the control of a 5-degree-of-freedom robot arm atop a Segway base. Movement of the arm causes the base to translate and tilt, which in turn affects the movement of the arm. The state space has 14 dimens...
Adaptive search area for fast motion estimation
This paper proposes a new method for determining the search area in block-matching motion estimation algorithms. In the proposed method, the search area is determined adaptively for each block of a frame. This search area is similar to that of the full search (FS) algorithm but smaller for most blocks of a frame. Therefore, the proposed algorithm is analogous to FS in terms of reg...
DISCRETE SIZE AND DISCRETE-CONTINUOUS CONFIGURATION OPTIMIZATION METHODS FOR TRUSS STRUCTURES USING THE HARMONY SEARCH ALGORITHM
Many methods have been developed for structural size and configuration optimization in which cross-sectional areas are usually assumed to be continuous. In most practical structural engineering design problems, however, the design variables are discrete. This paper proposes two efficient structural optimization methods based on the harmony search (HS) heuristic algorithm that treat both discret...
Gravitational Search Algorithm to Solve the K-of-N Lifetime Problem in Two-Tiered WSNs
Wireless Sensor Networks (WSNs) are networks of autonomous nodes used for monitoring an environment. In designing WSNs, one of the main issues is the limited energy source of each sensor node. Hence, there is a strong need for ways to optimize energy consumption in WSNs, which in turn increases network lifetime. Gravitational Search Algorithm (GSA) is a novel stochastic population-based meta-...
A Novel Continuous KNN Prediction Algorithm to Improve Manufacturing Policies in a VMI Supply Chain
This paper examines and compares various manufacturing policies that a manufacturer may adopt to improve the performance of a vendor managed inventory (VMI) partnership. The goal is to maximize the combined cumulative profit of the supply chain while minimizing relevant inventory management costs. The supply chain is a two-level system with a single manufacturer and single retailer at each lev...
Publication date: 2004